Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 17050 |
| Missing cells | 57097 |
| Missing cells (%) | 22.3% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.0 MiB |
| Average record size in memory | 120.0 B |
Variable types
| Numeric | 2 |
|---|---|
| Text | 9 |
| Unsupported | 2 |
| Categorical | 2 |
Alerts
| Alert | Type |
|---|---|
| Binding has constant value "1" | Constant |
| TCR_name is highly overall correlated with task | High correlation |
| Unnamed: 0 is highly overall correlated with task | High correlation |
| task is highly overall correlated with TCR_name and 1 other field | High correlation |
| TRAV has 1415 (8.3%) missing values | Missing |
| TRAJ has 1994 (11.7%) missing values | Missing |
| TRBV has 1259 (7.4%) missing values | Missing |
| TRBJ has 1483 (8.7%) missing values | Missing |
| TRAC has 17050 (100.0%) missing values | Missing |
| TRBC has 17050 (100.0%) missing values | Missing |
| MHC A has 1276 (7.5%) missing values | Missing |
| MHC B has 15570 (91.3%) missing values | Missing |
| Unnamed: 0 is uniformly distributed | Uniform |
| Unnamed: 0 has unique values | Unique |
| TCR_name has unique values | Unique |
| TRAC is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| TRBC is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
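The overall missing-cell figures in the dataset statistics can be cross-checked against the per-column Missing alerts. The following is a minimal sketch that uses only the counts reported above; the dictionary and variable names are illustrative and not part of the report.

```python
# Per-column missing counts copied from the Missing alerts above.
missing_per_column = {
    "TRAV": 1415,
    "TRAJ": 1994,
    "TRBV": 1259,
    "TRBJ": 1483,
    "TRAC": 17050,
    "TRBC": 17050,
    "MHC A": 1276,
    "MHC B": 15570,
}

n_rows = 17050  # "Number of observations"
n_cols = 15     # "Number of variables"

total_missing = sum(missing_per_column.values())
missing_pct = 100 * total_missing / (n_rows * n_cols)

# Columns with no values at all (every row missing).
all_missing = [name for name, count in missing_per_column.items() if count == n_rows]

print(total_missing, f"{missing_pct:.1f}%", all_missing)
```

The two fully missing columns (TRAC and TRBC) alone account for 34100 of the 57097 missing cells.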
Reproduction
| Analysis started | 2024-04-18 08:20:19.811901 |
|---|---|
| Analysis finished | 2024-04-18 08:20:21.956461 |
| Duration | 2.14 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Download configuration | config.json |
Unnamed: 0
Real number (ℝ)
HIGH CORRELATION  UNIFORM  UNIQUE 
| Distinct | 17050 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8524.5 |
| Minimum | 0 |
| Maximum | 17049 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 133.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 852.45 |
| Q1 | 4262.25 |
| median | 8524.5 |
| Q3 | 12786.75 |
| 95-th percentile | 16196.55 |
| Maximum | 17049 |
| Range | 17049 |
| Interquartile range (IQR) | 8524.5 |
Descriptive statistics
| Standard deviation | 4922.0554 |
|---|---|
| Coefficient of variation (CV) | 0.57740107 |
| Kurtosis | -1.2 |
| Mean | 8524.5 |
| Median Absolute Deviation (MAD) | 4262.5 |
| Skewness | 0 |
| Sum | 1.4534272 × 10⁸ |
| Variance | 24226629 |
| Monotonicity | Strictly increasing |
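These statistics are what a plain row index 0, 1, ..., 17049 would produce. The sketch below checks this with Python's standard library, assuming the column holds exactly that index and that the reported variance is the sample variance (n - 1 denominator). The closed forms (n - 1)/2 and n(n + 1)/12 apply to any such index.

```python
import math
import statistics

n = 17050
index = list(range(n))  # assumed content of the "Unnamed: 0" column

mean = statistics.mean(index)
median = statistics.median(index)
variance = statistics.variance(index)  # sample variance, n - 1 denominator
std = math.sqrt(variance)

# Closed forms for the integers 0..n-1.
closed_mean = (n - 1) / 2
closed_variance = n * (n + 1) / 12

print(mean, median, round(variance), round(std, 4), round(std / mean, 6))
```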
Common Values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 11371 | 1 | < 0.1% |
| 11357 | 1 | < 0.1% |
| 11358 | 1 | < 0.1% |
| 11359 | 1 | < 0.1% |
| 11360 | 1 | < 0.1% |
| 11361 | 1 | < 0.1% |
| 11362 | 1 | < 0.1% |
| 11363 | 1 | < 0.1% |
| 11364 | 1 | < 0.1% |
| Other values (17040) | 17040 | 99.9% |
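Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 17049 | 1 | < 0.1% |
| 17048 | 1 | < 0.1% |
| 17047 | 1 | < 0.1% |
| 17046 | 1 | < 0.1% |
| 17045 | 1 | < 0.1% |
| 17044 | 1 | < 0.1% |
| 17043 | 1 | < 0.1% |
| 17042 | 1 | < 0.1% |
| 17041 | 1 | < 0.1% |
| 17040 | 1 | < 0.1% |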
TCR_name
Real number (ℝ)
HIGH CORRELATION  UNIQUE 
| Distinct | 17050 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20136.679 |
| Minimum | 1 |
| Maximum | 56833 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 133.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 927.45 |
| Q1 | 4385.25 |
| median | 11449.5 |
| Q3 | 32285.75 |
| 95-th percentile | 55365.55 |
| Maximum | 56833 |
| Range | 56832 |
| Interquartile range (IQR) | 27900.5 |
Descriptive statistics
| Standard deviation | 19032.705 |
|---|---|
| Coefficient of variation (CV) | 0.94517596 |
| Kurtosis | -0.87959122 |
| Mean | 20136.679 |
| Median Absolute Deviation (MAD) | 9542 |
| Skewness | 0.76773102 |
| Sum | 3.4333037 × 10⁸ |
| Variance | 3.6224384 × 10⁸ |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 4 | 1 | < 0.1% |
| 55398 | 1 | < 0.1% |
| 55384 | 1 | < 0.1% |
| 55385 | 1 | < 0.1% |
| 55386 | 1 | < 0.1% |
| 55387 | 1 | < 0.1% |
| 55388 | 1 | < 0.1% |
| 55389 | 1 | < 0.1% |
| 55390 | 1 | < 0.1% |
| 55391 | 1 | < 0.1% |
| Other values (17040) | 17040 | 99.9% |
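Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 56833 | 1 | < 0.1% |
| 56832 | 1 | < 0.1% |
| 56829 | 1 | < 0.1% |
| 56828 | 1 | < 0.1% |
| 56827 | 1 | < 0.1% |
| 56826 | 1 | < 0.1% |
| 56825 | 1 | < 0.1% |
| 56824 | 1 | < 0.1% |
| 56823 | 1 | < 0.1% |
| 56822 | 1 | < 0.1% |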
TRAV
Text
MISSING 
| Distinct | 241 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 1415 |
| Missing (%) | 8.3% |
| Memory size | 133.3 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 17 |
| Mean length | 9.4722098 |
| Min length | 5 |
Characters and Unicode
| Total characters | 148098 |
|---|---|
| Distinct characters | 27 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 62 |
|---|---|
| Unique (%) | 0.4% |
Sample
| 1st row | TRAV38-2/DV8*01 |
|---|---|
| 2nd row | TRAV38-1*01 |
| 3rd row | TRAV12-2*01 |
| 4th row | TRAV12-2*01 |
| 5th row | TRAV12-2*01 |
| Value | Count | Frequency (%) |
| trav12-2*01 | 1018 | 6.5% |
| trav12-2 | 908 | 5.8% |
| trav13-1*01 | 626 | 4.0% |
| trav19*01 | 600 | 3.8% |
| trav27*01 | 547 | 3.5% |
| trav21*01 | 509 | 3.3% |
| trav1-2*01 | 495 | 3.2% |
| trav29/dv5*01 | 475 | 3.0% |
| trav14/dv4*01 | 466 | 3.0% |
| trav5*01 | 404 | 2.6% |
| Other values (229) | 9592 | 61.3% |
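The table lists the same gene both with and without an allele suffix (`trav12-2*01` and `trav12-2`). If gene-level counts are wanted, one option is to strip the suffix before counting. The sketch below is illustrative only: `strip_allele` is a hypothetical helper, and whether alleles should be collapsed at all is an analysis decision that the report does not make.

```python
from collections import Counter


def strip_allele(gene: str) -> str:
    """Drop an IMGT-style allele suffix such as '*01' and upper-case the name."""
    return gene.split("*", 1)[0].upper()


# Counts copied from the first rows of the table above.
observed = {
    "trav12-2*01": 1018,
    "trav12-2": 908,
    "trav13-1*01": 626,
    "trav19*01": 600,
}

gene_level = Counter()
for name, count in observed.items():
    gene_level[strip_allele(name)] += count

print(gene_level.most_common())
```

With this normalization the two TRAV12-2 spellings merge into a single entry of 1926 occurrences.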
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 21971 | 14.8% |
| V | 17320 | 11.7% |
| T | 15628 | 10.6% |
| R | 15628 | 10.6% |
| A | 15621 | 10.5% |
| 0 | 12541 | 8.5% |
| * | 11960 | 8.1% |
| 2 | 10776 | 7.3% |
| - | 7597 | 5.1% |
| 3 | 4367 | 2.9% |
| Other values (17) | 14689 | 9.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 148098 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 148098 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 148098 |
TRAJ
Text
MISSING 
| Distinct | 178 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 1994 |
| Missing (%) | 11.7% |
| Memory size | 133.3 KiB |
Length
| Max length | 11 |
|---|---|
| Median length | 9 |
| Mean length | 8.2304729 |
| Min length | 5 |
Characters and Unicode
| Total characters | 123918 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 33 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | TRAJ40*01 |
|---|---|
| 2nd row | TRAJ48*01 |
| 3rd row | TRAJ42*01 |
| 4th row | TRAJ48*01 |
| 5th row | TRAJ42*01 |
| Value | Count | Frequency (%) |
| traj42*01 | 1142 | 7.6% |
| traj52*01 | 474 | 3.1% |
| traj33*01 | 436 | 2.9% |
| traj45*01 | 432 | 2.9% |
| traj20*01 | 396 | 2.6% |
| traj49*01 | 376 | 2.5% |
| traj42 | 372 | 2.5% |
| traj37*01 | 350 | 2.3% |
| traj30*01 | 347 | 2.3% |
| traj50*01 | 342 | 2.3% |
| Other values (165) | 10389 | 69.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| T | 15056 | 12.1% |
| A | 15056 | 12.1% |
| R | 15056 | 12.1% |
| J | 15054 | 12.1% |
| 1 | 14224 | 11.5% |
| 0 | 13501 | 10.9% |
| * | 11621 | 9.4% |
| 4 | 5753 | 4.6% |
| 2 | 5345 | 4.3% |
| 3 | 5311 | 4.3% |
| Other values (9) | 7941 | 6.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 123918 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 123918 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 123918 |
TRA_CDR3
Text
| Distinct | 12741 |
|---|---|
| Distinct (%) | 74.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 133.3 KiB |
Length
| Max length | 30 |
|---|---|
| Median length | 26 |
| Mean length | 13.402346 |
| Min length | 3 |
Characters and Unicode
| Total characters | 228510 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 11206 |
|---|---|
| Unique (%) | 65.7% |
Sample
| 1st row | CAYRPPGTYKYIF |
|---|---|
| 2nd row | CAYTVLGNEKLTF |
| 3rd row | CAVAGYGGSQGNLIF |
| 4th row | CAVSFGNEKLTF |
| 5th row | CAVTHYGGSQGNLIF |
| Value | Count | Frequency (%) |
| caglnyggsqgnlif | 102 | 0.6% |
| caasetsydkvif | 97 | 0.6% |
| cagqnyggsqgnlif | 95 | 0.6% |
| cadsgggadgltf | 80 | 0.5% |
| cagmnyggsqgnlif | 77 | 0.5% |
| caigpgnmltf | 72 | 0.4% |
| cagggsqgnlif | 71 | 0.4% |
| cavdlmktsydkvif | 71 | 0.4% |
| cavgdnfnkfyf | 39 | 0.2% |
| cagagsqgnlif | 35 | 0.2% |
| Other values (12731) | 16311 | 95.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 29140 | 12.8% |
| A | 24526 | 10.7% |
| F | 19593 | 8.6% |
| L | 17353 | 7.6% |
| C | 16573 | 7.3% |
| S | 15848 | 6.9% |
| N | 14483 | 6.3% |
| T | 13452 | 5.9% |
| V | 11731 | 5.1% |
| K | 10240 | 4.5% |
| Other values (15) | 55571 | 24.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 228510 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 228510 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 228510 |
TRBV
Text
MISSING 
| Distinct | 225 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 1259 |
| Missing (%) | 7.4% |
| Memory size | 133.3 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 19 |
| Mean length | 9.0155152 |
| Min length | 5 |
Characters and Unicode
| Total characters | 142364 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 52 |
|---|---|
| Unique (%) | 0.3% |
Sample
| 1st row | TRBV14*01 |
|---|---|
| 2nd row | TRBV28*01 |
| 3rd row | TRBV28*01 |
| 4th row | TRBV28*01 |
| 5th row | TRBV28*01 |
| Value | Count | Frequency (%) |
| trbv19*01 | 1594 | 10.1% |
| trbv20-1*01 | 683 | 4.3% |
| trbv27*01 | 629 | 4.0% |
| trbv7-9*01 | 625 | 4.0% |
| trbv9*01 | 573 | 3.6% |
| trbv11-2*01 | 432 | 2.7% |
| trbv4-1*01 | 408 | 2.6% |
| trbv28*01 | 381 | 2.4% |
| trbv6-5*01 | 378 | 2.4% |
| trbv2*01 | 332 | 2.1% |
| Other values (213) | 9757 | 61.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 20733 | 14.6% |
| R | 15792 | 11.1% |
| T | 15791 | 11.1% |
| B | 15791 | 11.1% |
| V | 15791 | 11.1% |
| 0 | 13507 | 9.5% |
| * | 11977 | 8.4% |
| - | 9528 | 6.7% |
| 2 | 6032 | 4.2% |
| 9 | 3993 | 2.8% |
| Other values (12) | 13429 | 9.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 142364 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 142364 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 142364 |
TRBJ
Text
MISSING 
| Distinct | 60 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 1483 |
| Missing (%) | 8.7% |
| Memory size | 133.3 KiB |
Length
| Max length | 12 |
|---|---|
| Median length | 10 |
| Mean length | 9.2960108 |
| Min length | 5 |
Characters and Unicode
| Total characters | 144711 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 8 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | TRBJ2-1*01 |
|---|---|
| 2nd row | TRBJ2-1*01 |
| 3rd row | TRBJ1-1*01 |
| 4th row | TRBJ1-5*01 |
| 5th row | TRBJ2-3*01 |
| Value | Count | Frequency (%) |
| trbj2-7*01 | 2273 | 14.6% |
| trbj2-1*01 | 1740 | 11.2% |
| trbj1-2*01 | 1420 | 9.1% |
| trbj2-3*01 | 1371 | 8.8% |
| trbj1-1*01 | 1284 | 8.2% |
| trbj2-2*01 | 1063 | 6.8% |
| trbj2-5*01 | 803 | 5.2% |
| trbj1-5*01 | 732 | 4.7% |
| trbj2-7 | 613 | 3.9% |
| trbj1-2 | 537 | 3.4% |
| Other values (46) | 3731 | 24.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 21592 | 14.9% |
| T | 15567 | 10.8% |
| R | 15567 | 10.8% |
| B | 15567 | 10.8% |
| J | 15567 | 10.8% |
| - | 15354 | 10.6% |
| 2 | 13199 | 9.1% |
| 0 | 11970 | 8.3% |
| * | 11970 | 8.3% |
| 7 | 2925 | 2.0% |
| Other values (7) | 5433 | 3.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 144711 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 144711 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 144711 |
TRB_CDR3
Text
| Distinct | 13631 |
|---|---|
| Distinct (%) | 79.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 133.3 KiB |
Length
| Max length | 29 |
|---|---|
| Median length | 25 |
| Mean length | 14.249795 |
| Min length | 4 |
Characters and Unicode
| Total characters | 242959 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 12346 |
|---|---|
| Unique (%) | 72.4% |
Sample
| 1st row | CASSALASLNEQFF |
|---|---|
| 2nd row | CASSFTPYNEQFF |
| 3rd row | CASSPQGLGTEAFF |
| 4th row | CAEGQGFVGQPQHF |
| 5th row | CASLRSAVWADTQYF |
| Value | Count | Frequency (%) |
| cassirssyeqyf | 189 | 1.1% |
| casswgggshygytf | 146 | 0.9% |
| cassfsgntgelff | 97 | 0.6% |
| casslrdgseaff | 86 | 0.5% |
| cassirsayeqyf | 42 | 0.2% |
| csvdleanygytf | 31 | 0.2% |
| cassirstdtqyf | 28 | 0.2% |
| cassarssyeqyf | 27 | 0.2% |
| cassqrpsevgelff | 26 | 0.2% |
| cassfpgqgntqyf | 26 | 0.2% |
| Other values (13621) | 16352 | 95.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 39769 | 16.4% |
| G | 25361 | 10.4% |
| A | 25183 | 10.4% |
| F | 23629 | 9.7% |
| Y | 15999 | 6.6% |
| T | 15836 | 6.5% |
| C | 15813 | 6.5% |
| Q | 15569 | 6.4% |
| E | 13635 | 5.6% |
| L | 9990 | 4.1% |
| Other values (11) | 42175 | 17.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 242959 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 242959 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 242959 |
TRAC
Unsupported
MISSING  REJECTED  UNSUPPORTED 
| Missing | 17050 |
|---|---|
| Missing (%) | 100.0% |
| Memory size | 133.3 KiB |
TRBC
Unsupported
MISSING  REJECTED  UNSUPPORTED 
| Missing | 17050 |
|---|---|
| Missing (%) | 100.0% |
| Memory size | 133.3 KiB |
Epitope
Text
| Distinct | 576 |
|---|---|
| Distinct (%) | 3.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 133.3 KiB |
Length
| Max length | 25 |
|---|---|
| Median length | 9 |
| Mean length | 9.667566 |
| Min length | 8 |
Characters and Unicode
| Total characters | 164832 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 259 |
|---|---|
| Unique (%) | 1.5% |
Sample
| 1st row | FLKEKGGL |
|---|---|
| 2nd row | ELAGIGILTV |
| 3rd row | ELAGIGILTV |
| 4th row | ELAGIGILTV |
| 5th row | ELAGIGILTV |
| Value | Count | Frequency (%) |
| klggalqak | 5440 | 31.9% |
| gilgfvftl | 1650 | 9.7% |
| rakfkqll | 953 | 5.6% |
| avfdrksdak | 756 | 4.4% |
| llwngpmav | 644 | 3.8% |
| tfeyvsqpflmdle | 561 | 3.3% |
| ivtdfsvik | 512 | 3.0% |
| llldrlnql | 492 | 2.9% |
| nlvpmvatv | 480 | 2.8% |
| elagigiltv | 422 | 2.5% |
| Other values (566) | 5140 | 30.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| L | 29199 | 17.7% |
| A | 19508 | 11.8% |
| G | 19161 | 11.6% |
| K | 16766 | 10.2% |
| V | 10287 | 6.2% |
| Q | 10172 | 6.2% |
| F | 9428 | 5.7% |
| T | 7253 | 4.4% |
| I | 6367 | 3.9% |
| P | 5642 | 3.4% |
| Other values (10) | 31049 | 18.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 164832 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 164832 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 164832 |
MHC A
Text
MISSING 
| Distinct | 102 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 1276 |
| Missing (%) | 7.5% |
| Memory size | 133.3 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 11 |
| Mean length | 11.034677 |
| Min length | 6 |
Characters and Unicode
| Total characters | 174061 |
|---|---|
| Distinct characters | 23 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 30 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | HLA-B*08 |
|---|---|
| 2nd row | HLA-A*02 |
| 3rd row | HLA-A*02 |
| 4th row | HLA-A*02 |
| 5th row | HLA-A*02 |
| Value | Count | Frequency (%) |
| hla-a*03:01 | 5667 | 35.9% |
| hla-a*02:01 | 5058 | 32.1% |
| hla-a*11:01 | 1310 | 8.3% |
| hla-b*08:01 | 989 | 6.3% |
| hla-b*07:02 | 394 | 2.5% |
| hla-a*02 | 329 | 2.1% |
| hla-a*01:01 | 298 | 1.9% |
| hla-a*24:02 | 280 | 1.8% |
| hla-dqa1*05:01 | 275 | 1.7% |
| hla-b*57:01 | 194 | 1.2% |
| Other values (92) | 980 | 6.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 29522 | 17.0% |
| 0 | 28818 | 16.6% |
| 1 | 18342 | 10.5% |
| H | 15774 | 9.1% |
| L | 15774 | 9.1% |
| - | 15774 | 9.1% |
| * | 15631 | 9.0% |
| : | 15327 | 8.8% |
| 2 | 6654 | 3.8% |
| 3 | 5907 | 3.4% |
| Other values (13) | 6538 | 3.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 174061 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 174061 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 174061 |
MHC B
Text
MISSING 
| Distinct | 54 |
|---|---|
| Distinct (%) | 3.6% |
| Missing | 15570 |
| Missing (%) | 91.3% |
| Memory size | 133.3 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 14 |
| Mean length | 12.982432 |
| Min length | 8 |
Characters and Unicode
| Total characters | 19214 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 17 |
|---|---|
| Unique (%) | 1.1% |
Sample
| 1st row | HLA-DRB1*15:03 |
|---|---|
| 2nd row | HLA-DRB3*03:01 |
| 3rd row | HLA-DPB1*13:01 |
| 4th row | HLA-DRB5*01:01:01 |
| 5th row | HLA-DRB1*01:01:01 |
| Value | Count | Frequency (%) |
| hla-dpb1*04:01 | 403 | 27.2% |
| hla-dqb1*06:02 | 229 | 15.5% |
| hla-a*02 | 154 | 10.4% |
| hla-drb1*04:01 | 150 | 10.1% |
| hla-drb1*07:01 | 135 | 9.1% |
| hla-a*02:01 | 102 | 6.9% |
| hla-dqb1*02:01 | 60 | 4.1% |
| hla-a*24:02 | 33 | 2.2% |
| hla-drb1*14:02 | 25 | 1.7% |
| hla-drb1*15:01 | 20 | 1.4% |
| Other values (44) | 169 | 11.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 2662 | 13.9% |
| 1 | 2191 | 11.4% |
| A | 1777 | 9.2% |
| H | 1480 | 7.7% |
| - | 1480 | 7.7% |
| L | 1480 | 7.7% |
| * | 1476 | 7.7% |
| : | 1337 | 7.0% |
| B | 1178 | 6.1% |
| D | 1125 | 5.9% |
| Other values (12) | 3028 | 15.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 19214 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 19214 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 19214 |
Binding
Categorical
CONSTANT 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 133.3 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 17050 |
|---|---|
| Distinct characters | 1 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 17050 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 17050 | 100.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 17050 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 17050 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 17050 |
task
Categorical
HIGH CORRELATION 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 133.3 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 68200 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | TPP2 |
|---|---|
| 2nd row | TPP2 |
| 3rd row | TPP2 |
| 4th row | TPP2 |
| 5th row | TPP2 |
Common Values
| Value | Count | Frequency (%) |
| TPP2 | 11101 | 65.1% |
| TPP1 | 4857 | 28.5% |
| TPP3 | 1092 | 6.4% |
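The Frequency (%) column is each count divided by the number of non-missing rows. A short sketch of that arithmetic for the three task labels, using only the counts in the table above (variable names are illustrative):

```python
# Counts copied from the Common Values table above.
counts = {"TPP2": 11101, "TPP1": 4857, "TPP3": 1092}

total = sum(counts.values())  # task has no missing values, so this is the row count
frequency_pct = {label: round(100 * c / total, 1) for label, c in counts.items()}

print(total, frequency_pct)
```

The classes are clearly imbalanced: TPP2 covers roughly two thirds of the rows, while TPP3 covers less than a tenth.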
Most occurring characters
| Value | Count | Frequency (%) |
| P | 34100 | 50.0% |
| T | 17050 | 25.0% |
| 2 | 11101 | 16.3% |
| 1 | 4857 | 7.1% |
| 3 | 1092 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 68200 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 68200 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 68200 |
Correlations
| | TCR_name | Unnamed: 0 | task |
|---|---|---|---|
| TCR_name | 1.000 | 0.100 | 0.518 |
| Unnamed: 0 | 0.100 | 1.000 | 0.711 |
| task | 0.518 | 0.711 | 1.000 |
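The High correlation alerts can be read off this matrix by flagging every off-diagonal pair above a cut-off. The sketch below assumes a threshold of 0.5. That value is an assumption chosen because it reproduces the alerts listed in this report, not a setting confirmed by the report itself.

```python
from itertools import combinations

THRESHOLD = 0.5  # assumed cut-off, consistent with the alerts in this report

# Upper triangle of the correlation matrix above.
correlation = {
    ("TCR_name", "Unnamed: 0"): 0.100,
    ("TCR_name", "task"): 0.518,
    ("Unnamed: 0", "task"): 0.711,
}

variables = ["TCR_name", "Unnamed: 0", "task"]
high = [
    pair
    for pair in combinations(variables, 2)
    if abs(correlation[pair]) >= THRESHOLD
]

print(high)
```

Both flagged pairs involve task, which matches the alert that task is correlated with TCR_name and one other field.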
First rows
| | Unnamed: 0 | TCR_name | TRAV | TRAJ | TRA_CDR3 | TRBV | TRBJ | TRB_CDR3 | TRAC | TRBC | Epitope | MHC A | MHC B | Binding | task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 4 | TRAV38-2/DV8*01 | TRAJ40*01 | CAYRPPGTYKYIF | TRBV14*01 | TRBJ2-1*01 | CASSALASLNEQFF | NaN | NaN | FLKEKGGL | HLA-B*08 | NaN | 1 | TPP2 |
| 1 | 1 | 14 | TRAV38-1*01 | TRAJ48*01 | CAYTVLGNEKLTF | TRBV28*01 | TRBJ2-1*01 | CASSFTPYNEQFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 2 | 2 | 15 | TRAV12-2*01 | TRAJ42*01 | CAVAGYGGSQGNLIF | TRBV28*01 | TRBJ1-1*01 | CASSPQGLGTEAFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 3 | 3 | 16 | TRAV12-2*01 | TRAJ48*01 | CAVSFGNEKLTF | TRBV28*01 | TRBJ1-5*01 | CAEGQGFVGQPQHF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 4 | 4 | 17 | TRAV12-2*01 | TRAJ42*01 | CAVTHYGGSQGNLIF | TRBV28*01 | TRBJ2-3*01 | CASLRSAVWADTQYF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 5 | 5 | 18 | TRAV12-2*01 | TRAJ45*01 | CAGGGGGADGLTF | TRBV28*01 | TRBJ1-5*01 | CASTLTGLGQPQHF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 6 | 6 | 19 | TRAV12-2*01 | TRAJ23*01 | CAVTWGGKLIF | TRBV28*01 | TRBJ1-1*01 | CASSFQGLGTEAFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 7 | 7 | 20 | TRAV12-2*01 | NaN | CCAVSIGFGNVLHCGF | TRBV27*01 | TRBJ2-1*01 | CASSFNDEQFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 8 | 8 | 21 | TRAV12-2*01 | TRAJ31*01 | CAVNNARLMF | TRBV27*01 | TRBJ2-3*01 | CASSPSGLAGGHTQYF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
| 9 | 9 | 22 | TRAV12-2*01 | NaN | CCAATIGFGNVLHCGF | TRBV27*01 | TRBJ2-1*01 | CASSMTSYNEQFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2 |
Last rows
| | Unnamed: 0 | TCR_name | TRAV | TRAJ | TRA_CDR3 | TRBV | TRBJ | TRB_CDR3 | TRAC | TRBC | Epitope | MHC A | MHC B | Binding | task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17040 | 17040 | 7873 | TRAV14/DV4*01 | TRAJ9*01 | CAMREGENGGFKTIF | TRBV7-9*01 | TRBJ1-4*01 | CASSLVGGTDEKLFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1 |
| 17041 | 17041 | 7875 | TRAV8-2*01 | TRAJ4*01 | CVVSEAGGYNKLIF | TRBV5-1*01 | TRBJ1-1*01 | CASSLGSGWEAFF | NaN | NaN | KLGGALQAK | HLA-A*03:01 | NaN | 1 | TPP1 |
| 17042 | 17042 | 7876 | TRAV13-2*01 | TRAJ52*01 | CAERVGAGGTSYGKLTF | TRBV6-3*01 | TRBJ2-2*01 | CASSYGFGGHNTGELFF | NaN | NaN | KLGGALQAK | HLA-A*03:01 | NaN | 1 | TPP1 |
| 17043 | 17043 | 7877 | TRAV8-6*01 | TRAJ10*01 | CAVSGWGLTGGGNKLTF | TRBV6-5*01 | TRBJ1-1*01 | CASTGPLNTEAFF | NaN | NaN | KLGGALQAK | HLA-A*03:01 | NaN | 1 | TPP1 |
| 17044 | 17044 | 7879 | TRAV22*01 | TRAJ37*01 | CAGSPSNTGKLIF | TRBV7-9*01 | TRBJ2-7*01 | CASSTSEGGLFYEQYF | NaN | NaN | GILGFVFTL | HLA-A*02:01 | NaN | 1 | TPP1 |
| 17045 | 17045 | 7881 | TRAV8-3*01 | TRAJ26*01 | CAVGARDYGQNFVF | TRBV7-3*01 | TRBJ2-2*01 | CASSLGTSGGTGELFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1 |
| 17046 | 17046 | 7884 | TRAV9-2*01 | TRAJ30*01 | CALLNRDDKIIF | TRBV5-1*01 | TRBJ1-1*01 | CASSYGTGENTEAFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1 |
| 17047 | 17047 | 7887 | TRAV19*01 | TRAJ17*01 | CALKLIKAAGNKLTF | TRBV4-1*01 | TRBJ1-2*01 | CASSTSTGTGYGYTF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1 |
| 17048 | 17048 | 7888 | TRAV5*01 | TRAJ31*01 | CAEDNNARLMF | TRBV20-1*01 | TRBJ1-3*01 | CSARPQPVGNTIYF | NaN | NaN | GLCTLVAML | HLA-A*02:01 | NaN | 1 | TPP1 |
| 17049 | 17049 | 7890 | TRAV13-1*01 | TRAJ20*01 | CAASGYDYKLSF | TRBV5-6*01 | TRBJ1-1*01 | CASSLRDGSEAFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1 |